It has been shown that the communities of complex networks often overlap with each other. However,
there is no effective method to quantify the overlapping community structure. In this paper,
we propose a metric to address this problem. Instead of assuming that one node can only belong 
to one community, our metric assumes that a maximal clique only belongs to one community. In 
this way, the overlaps between communities are allowed. To identify the overlapping community 
structure, we construct a maximal clique network from the original network, and prove that the 
optimization of our metric on the original network is equivalent to the optimization of Newman's 
modularity on the maximal clique network. Thus the overlapping community structure can be iden- 
tified through partitioning the maximal clique network using any modularity optimization method. 
The effectiveness of our metric is demonstrated by extensive tests on both the artificial networks and 
the real world networks with known community structure. The application to the word association 
network also reproduces excellent results. 

PACS numbers: 89.75.Fb, 89.75.Hc



I. INTRODUCTION 

Many complex systems in nature and society can be described in terms of networks or graphs. The study of networks is crucial to understanding both the structure and the function of these complex systems [1]. A common feature of complex networks is community structure, i.e., the existence of groups of nodes such that nodes within a group are much more densely connected to each other than to the rest of the network. Communities reflect the locality of the topological relationships between the elements of the target systems, and may shed light on the relation between the structure and the function of complex networks. Taking the World Wide Web as an example, closely hyperlinked web pages form a community and they often cover related topics.

The identification of community structure has attracted much attention from various scientific fields. Many methods have been proposed and applied successfully to specific complex networks. In order to quantify the community structure of networks, Newman and Girvan proposed the modularity as a measure of a partition of a network, in which each node belongs to only one community. The proposal of modularity has greatly promoted the detection of community structure. However, the modularity faces several problems. For example, it suffers from a resolution limit. Furthermore, modularity-based methods cannot handle overlapping community structure, in which one node may belong to more than one community. Figure 1 shows an example
FIG. 1: A schematic network with overlapping community
structure. Communities are differentiated by colors and the 
overlapping regions are emphasized in red. The edges between 
communities are colored in gray. 



network with overlapping community structure. Intuitively, overlapping community structure can be represented by a cover of the network. A cover of a network is defined as a set of clusters such that each node is assigned to one or more clusters and no cluster is a proper subset of any other cluster. For the network in figure 1, the overlapping community structure can be represented by the cover {{1,2,3,4,5,6}, {3,7,8,9,10,11,12,13}, {10,11,12,14,15,16,17}, {18,19,20,21,22,23,24}}.
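
The definition of a cover can be checked mechanically. The following is a minimal sketch, not part of the original method; the function name `is_cover` is chosen here for illustration, and the vertex labels 1 to 24 are taken from the cover listed above for figure 1.

```python
def is_cover(nodes, clusters):
    """Return True if `clusters` is a cover of `nodes` as defined in the text."""
    sets = [frozenset(c) for c in clusters]
    # Every node must be assigned to at least one cluster.
    covered = set().union(*sets) if sets else set()
    if covered != set(nodes):
        return False
    # No cluster may be a proper subset of another cluster.
    for i, a in enumerate(sets):
        for j, b in enumerate(sets):
            if i != j and a < b:
                return False
    return True


# The cover given in the text for the schematic network of figure 1.
figure1_cover = [
    {1, 2, 3, 4, 5, 6},
    {3, 7, 8, 9, 10, 11, 12, 13},
    {10, 11, 12, 14, 15, 16, 17},
    {18, 19, 20, 21, 22, 23, 24},
]
print(is_cover(range(1, 25), figure1_cover))
```

Adding a cluster such as {1, 2} would violate the second condition, since it is a proper subset of the first cluster.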

Overlapping community structure has been widely studied. In [13], the community structure is uncovered by k-clique percolation and the overlaps between communities are guaranteed by the fact that one node can participate in more than one clique. However, the k-clique method gives rise to an incomplete cover of the network, i.e., some nodes may not belong to any community. In addition, the hierarchical structure cannot be revealed for a given k. In [24], by introducing the concept of the belonging coefficients of each node to its communities, the authors proposed a general framework for extending the traditional modularity to quantify overlapping community structure. The method provides a new idea for finding overlapping community structure. However, the physical meaning of the belonging coefficient lacks a clear explanation. Furthermore, the framework is hard to extend to large scale networks since it is difficult to find an efficient algorithm to search the huge solution space. Recently, Evans et al. proposed a method to identify the overlapping community structure by partitioning a line graph constructed from the original network. This method only allows the communities to overlap at nodes.

In this paper, a measure for the quality of a cover, referred to as Qc (quality of a cover), is proposed to quantify the overlapping community structure. With the measure Qc, the overlapping community structure can be identified by finding an optimal cover, i.e., the one with the maximum Qc. The measure Qc is based on a maximal clique view of the original network. A maximal clique is a clique (i.e., a complete subgraph) which is not a subset of any other clique in the graph. The maximal clique view rests on the reasonable assumption that a maximal clique cannot be shared by two communities because it is highly connected. To find an optimal cover, we construct a maximal clique network from the original network. We then prove that the optimization of Qc on the original network is equivalent to the optimization of the modularity on the maximal clique network. Thus the overlapping community structure can be identified by partitioning the maximal clique network with an efficient modularity optimization algorithm, e.g., the fast unfolding algorithm. The effectiveness of the measure Qc is demonstrated by extensive tests on both artificial networks and real world networks with known community structure, and by the application to the word association network.



II. THE QUANTIFYING AND IDENTIFYING 
METHODS 

In this section, we first propose a measure Qc to quan- 
tify the overlapping community structure of networks. 
Then the overlapping community structure of a network 
is identified by partitioning a maximal clique network 
constructed from the original network using a modularity 
optimization algorithm. Finally, some discussions about 
our method are given. 



A. Quantifying the overlapping community 
structure 

As mentioned above, the overlapping community struc- 
ture can be represented as a cover of network instead of 
a partition of network. Therefore, the overlapping com- 
munity structure can be quantified through a measure of 
a cover of network. 

As is well known, the modularity is used to measure the goodness of a partition of a network. Given an un-weighted, undirected network G(E, V) and a partition P of the network G, the modularity can be formalized as

$$Q = \frac{1}{L} \sum_{c \in P} \sum_{vw} \delta_{vc}\,\delta_{wc} \left( A_{vw} - \frac{k_v k_w}{L} \right), \tag{1}$$

where A is the adjacency matrix of the network G, $L = \sum_{vw} A_{vw}$ is the total weight of all the edges, and $k_v = \sum_{w} A_{vw}$ is the degree of the vertex v.
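
As an illustration of the modularity of a partition, the sketch below evaluates it for a small unweighted graph. It assumes the convention L = Σ_vw A_vw, i.e., each undirected edge is counted once in each direction; the function name `modularity` and the six-vertex example graph are hypothetical and not taken from the paper.

```python
def modularity(adj, partition):
    """Modularity of a partition of an unweighted, undirected graph.

    adj: dict mapping each vertex to the set of its neighbours.
    partition: iterable of disjoint vertex sets covering all vertices.
    Assumes L = sum_vw A_vw, so every undirected edge contributes 2 to L.
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    L = sum(degree.values())
    q = 0.0
    for community in partition:
        members = set(community)
        # Sum of A_vw over ordered pairs (v, w) inside the community.
        internal = sum(1 for v in members for w in adj[v] if w in members)
        # Sum of the degrees k_v over the community.
        total_degree = sum(degree[v] for v in members)
        q += internal / L - (total_degree / L) ** 2
    return q


# Hypothetical example: two triangles joined by the single edge (2, 3).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
q_split = modularity(adj, [{0, 1, 2}, {3, 4, 5}])
q_single = modularity(adj, [{0, 1, 2, 3, 4, 5}])
print(q_split, q_single)
```

Splitting the graph into its two triangles gives a positive value, while putting all vertices into a single community gives zero.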

In equation (1), $\delta_{vc}$ denotes whether the vertex v belongs to the community c. The value of $\delta_{vc}$ is 1 when the vertex v belongs to the community c and 0 otherwise. For a cover of a network, however, a vertex may belong to more than one community. Thus $\delta_{vc}$ needs to be extended to a belonging coefficient $\alpha_{vc}$, which reflects how much the vertex v belongs to the community c.

With the belonging coefficient $\alpha_{vc}$, the goodness of a cover C can be measured by

$$Q_c = \frac{1}{L} \sum_{c \in C} \sum_{vw} \alpha_{vc}\,\alpha_{wc} \left( A_{vw} - \frac{k_v k_w}{L} \right). \tag{2}$$

The idea of the belonging coefficient was proposed in [24]. Its authors also pointed out that the belonging coefficient should satisfy a normalization property. This property is formally written as

$$0 \le \alpha_{vc} \le 1, \quad \forall v \in V, \ \forall c \in C \tag{3}$$

and

$$\sum_{c \in C} \alpha_{vc} = 1. \tag{4}$$

Equation (3) and equation (4) only give the general constraints on $\alpha_{vc}$, which lead to such a huge solution space that the enumeration of all the solutions is impractical. To reduce the solution space and make the problem tractable, we introduce an additivity property for the belonging coefficient: the belonging coefficient of a vertex to a community c is the sum of the belonging coefficients of the vertex to all of c's sub-communities.

For example, we assume that $C = \{c_1, c_2, \ldots, c_{r-1}, c_r, \ldots, c_s, c_{s+1}, \ldots, c_n\}$ is a cover of the network G and $C' = \{c_1, c_2, \ldots, c_{r-1}, c_u, c_{s+1}, \ldots, c_n\}$ is another cover of G. The difference between C' and C is that the community $c_u$ is the union of the communities $c_r, \ldots, c_s$. The additivity property of the belonging coefficient can then be formally denoted as

$$\alpha_{v c_u} = \sum_{i=r}^{s} \alpha_{v c_i}. \tag{5}$$

The belonging coefficient $\alpha_{vc}$ reflects how much a vertex v belongs to a community c. Intuitively, it is proportional to the total weight of the edges connecting the vertex v to the vertices in the community c, i.e.,

$$\alpha_{vc} \propto \sum_{w \in V(c)} A_{vw}, \tag{6}$$

where V(c) denotes the set of vertices belonging to community c. Note that the additivity property of the belonging coefficient requires that communities are disjoint from a proper view of the network. Therefore, we introduce the maximal clique view to achieve this purpose. We define $\alpha_{vc}$ as

$$\alpha_{vc} = \frac{1}{\alpha_v} \sum_{w \in V(c)} \frac{O^{c}_{vw}}{O_{vw}} A_{vw}, \tag{7}$$

where $O_{vw}$ denotes the number of maximal cliques containing the edge (v, w) in the whole network, $O^{c}_{vw}$ denotes the number of maximal cliques containing the edge (v, w) in the community c, and $\alpha_v$ is a normalization term given by

$$\alpha_v = \sum_{c \in C} \sum_{w \in V(c)} \frac{O^{c}_{vw}}{O_{vw}} A_{vw}. \tag{8}$$

Obviously, the definition in equation (7) satisfies the normalization property. It also satisfies the additivity property if we assume that each maximal clique belongs to only one community. This assumption is reasonable since a maximal clique is so highly connected that any two communities sharing a maximal clique should be combined into a single one.
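
The definition of the belonging coefficient can be made concrete with a short sketch. It assumes that the list of maximal cliques is already available and that every edge lies in at least one listed clique; the function name `belonging_coefficients`, the argument layout, and the four-vertex example (two triangles sharing the edge (1, 4)) are illustrative choices rather than part of the paper.

```python
from collections import defaultdict
from itertools import combinations


def belonging_coefficients(adj, cliques, community_of):
    """alpha[v][c]: edge-wise clique counts per community, normalized per vertex.

    adj: dict vertex -> set of neighbours (unweighted, undirected).
    cliques: list of maximal cliques, each a set of vertices.
    community_of: list with the community label of each clique; each
        maximal clique belongs to exactly one community.
    """
    o_total = defaultdict(int)                        # O_vw
    o_comm = defaultdict(lambda: defaultdict(int))    # O^c_vw
    for clique, c in zip(cliques, community_of):
        for v, w in combinations(clique, 2):
            for edge in ((v, w), (w, v)):
                o_total[edge] += 1
                o_comm[edge][c] += 1

    alpha = {}
    for v, nbrs in adj.items():
        raw = defaultdict(float)
        for w in nbrs:
            for c, count in o_comm[(v, w)].items():
                raw[c] += count / o_total[(v, w)]
        norm = sum(raw.values())                      # normalization term alpha_v
        alpha[v] = {c: x / norm for c, x in raw.items()} if norm else {}
    return alpha


# Hypothetical example: triangles {1, 2, 4} and {1, 3, 4} sharing the edge (1, 4).
adj = {1: {2, 3, 4}, 2: {1, 4}, 3: {1, 4}, 4: {1, 2, 3}}
cliques = [{1, 2, 4}, {1, 3, 4}]
alpha_split = belonging_coefficients(adj, cliques, ["a", "b"])
alpha_merged = belonging_coefficients(adj, cliques, ["a", "a"])
print(alpha_split[1], alpha_merged[1])
```

When the two cliques are placed in different communities, vertices 1 and 4 belong half to each; when both cliques are assigned to the same community, the coefficients add up to 1, in line with the additivity property.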

With equation (2) and equation (7), we obtain the detailed form of Qc as a measure of the quality of a cover of a network. Note that when a cover degrades to a partition, Qc becomes the modularity Q accordingly. In addition, Qc = 0 when all vertices belong to the same community, and it will be shown later in section III that a high value of Qc indicates a significant overlapping community structure.
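
The value of Qc for the trivial cover can be verified in one line. The following worked step is a sketch under the assumptions that there is a single community c with $\alpha_{vc} = 1$ for every vertex v, that $L = \sum_{vw} A_{vw}$, and hence that $\sum_v k_v = L$:

```latex
Q_c = \frac{1}{L}\sum_{vw}\left(A_{vw}-\frac{k_v k_w}{L}\right)
    = \frac{1}{L}\left(\sum_{vw}A_{vw}-\frac{1}{L}\sum_{v}k_v\sum_{w}k_w\right)
    = \frac{1}{L}\left(L-\frac{L^2}{L}\right)
    = 0 .
```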

B. Identifying the overlapping community 
structure 

With the measure Qc, the overlapping community structure of a network can be identified by finding the optimal cover with maximum Qc. To find the optimal cover, we construct a maximal clique network from the original network. Then the overlapping community structure can be identified by partitioning the maximal clique network.



1. Construction of the maximal clique network 

Given an un-weighted, undirected network G, a corresponding maximal clique network G' can be constructed through the following method.

The maximal clique network G' is constructed by defining its nodes and edges. We first find all the maximal cliques in G. We could simply take all these maximal cliques as nodes of G'. In practice, however, we observe that some maximal cliques are not so highly connected if their sizes are too small. Such a maximal clique either lies between different communities (e.g., the maximal cliques {4, 23} and {5, 22} in the network shown in figure 1) or connects a node to the whole network (e.g., the maximal clique {8, 11} in the network shown in figure 2(a)). To deal with these small maximal cliques, we introduce a threshold k. Specifically, given the parameter k, we refer only to those maximal cliques with size no smaller than k as maximal cliques, and refer to those with size smaller than k as subordinate maximal cliques. We then call the vertices that belong only to subordinate maximal cliques subordinate vertices. In this way, each maximal clique or subordinate vertex in the original network G is taken as one node of G'.
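
The node set of the maximal clique network can be sketched as follows. The paper does not say which clique enumeration algorithm is used; the basic Bron-Kerbosch recursion is swapped in here as an assumption, and the function names and the five-vertex example graph are hypothetical.

```python
def maximal_cliques(adj):
    """Enumerate all maximal cliques with the basic Bron-Kerbosch recursion."""
    found = []

    def expand(r, p, x):
        # r: current clique, p: candidates, x: already processed vertices.
        if not p and not x:
            found.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return found


def split_by_threshold(adj, k):
    """Return (maximal cliques of size >= k, subordinate vertices)."""
    kept = [c for c in maximal_cliques(adj) if len(c) >= k]
    covered = set().union(*kept) if kept else set()
    # Subordinate vertices belong only to cliques smaller than k.
    subordinate = set(adj) - covered
    return kept, subordinate


# Hypothetical example: two triangles sharing the edge (1, 4), plus a
# pendant vertex 5 attached to vertex 4.
adj = {1: {2, 3, 4}, 2: {1, 4}, 3: {1, 4}, 4: {1, 2, 3, 5}, 5: {4}}
kept, subordinate = split_by_threshold(adj, k=3)
print(sorted(sorted(c) for c in kept), subordinate)
```

With k = 3 the clique {4, 5} becomes a subordinate maximal clique, so vertex 5 is a subordinate vertex while vertex 4 is not, because it also belongs to the two triangles.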

Note that all the subordinate vertices and the maximal cliques form a cover C of the original network G. For a subordinate vertex v and a cluster c in the cover C, the value of $\alpha_{vc}$ is defined to be 1.0 when v belongs to the cluster c and 0.0 otherwise. For the other vertices, $\alpha_{vc}$ can be obtained according to equation (7).

Now we can define the edges of the maximal clique network G' by defining its adjacency matrix B. Let $m_x$ denote the set of the original network's vertices corresponding to the x-th node in G'. The element of B is defined as

$$B_{xy} = \sum_{vw} \alpha_{v m_x}\,\alpha_{w m_y} A_{vw} \tag{9}$$

and the strength (degree) of the x-th node is

$$S_x = \sum_{y} B_{xy} = \sum_{v} \alpha_{v m_x} k_v. \tag{10}$$

For clarity, figure 2 illustrates the construction process of the maximal clique network from an example network with the parameter k = 3. Figure 2(b) shows the subordinate vertices and the maximal cliques. Each of them becomes a node in the resulting maximal clique network. For example, the maximal clique {1,2,4} corresponds to the node a and the subordinate vertex {5} corresponds to the node d. Each of these maximal cliques or subordinate vertices is a cluster in a cover C of the original network. Their belonging coefficients corresponding to the cover C are shown in figure 2(c). According to these belonging coefficients and equation (9), the weight of each edge of the maximal clique network is obtained. Take the edge connecting the nodes a and b as an example. As known, the node a corresponds to the maximal clique {1,2,4}
[Figure 2, panel (b): a cover of the original network. Maximal cliques: a = {1,2,4}, b = {1,3,4}, c = {7,8,9,10}. Subordinate vertices: d = {5}, e = {6}, f = {11}.]

FIG. 2: Illustration of the construction process of the maximal clique network. (a) The original example network. (b) A cover of the original network. In this cover, each maximal clique is a cluster and each subordinate vertex forms a cluster consisting of only one vertex. (c) The belonging coefficient of each vertex to its corresponding clusters in the cover. (d) The maximal clique network constructed from the example network. Here the parameter k = 3.



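and the node b corresponds to the maximal clique {1,3,4}. Using equation (9), the weight of this edge is

$$\alpha_{1a}\alpha_{3b} + \alpha_{1a}\alpha_{4b} + \alpha_{2a}\alpha_{1b} + \alpha_{2a}\alpha_{4b} + \alpha_{4a}\alpha_{1b} + \alpha_{4a}\alpha_{3b} = 0.5 + 0.25 + 0.5 + 0.5 + 0.25 + 0.5 = 2.5.$$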

The constructed maximal clique network is a weighted network though the original network is un-weighted. The total weight L' of all the edges in the maximal clique network is equal to the total weight (number) L of edges in the original network. The proof is

$$\begin{aligned}
L' &= \sum_{xy} B_{xy} \\
   &= \sum_{xy} \sum_{vw} \alpha_{v m_x}\,\alpha_{w m_y} A_{vw} \\
   &= \sum_{vw} A_{vw} \sum_{x} \alpha_{v m_x} \sum_{y} \alpha_{w m_y} \\
   &= \sum_{vw} A_{vw} \\
   &= L.
\end{aligned} \tag{11}$$
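
The edge weights of the maximal clique network and the conservation of total weight can be checked numerically. The sketch below uses only the four vertices 1 to 4 with the two triangles {1,2,4} and {1,3,4}, which is an assumed fragment and not the full network of figure 2; the belonging coefficients are the ones that enter the worked example above, and the function name is hypothetical.

```python
def clique_network_weights(adj, alpha):
    """B[x][y] = sum over adjacent ordered pairs (v, w) of alpha[v][x] * alpha[w][y]."""
    nodes = sorted({x for coeffs in alpha.values() for x in coeffs})
    B = {x: {y: 0.0 for y in nodes} for x in nodes}
    for v, nbrs in adj.items():
        for w in nbrs:                       # every ordered pair with A_vw = 1
            for x, a_vx in alpha[v].items():
                for y, a_wy in alpha[w].items():
                    B[x][y] += a_vx * a_wy
    return B


# Assumed fragment: triangles a = {1, 2, 4} and b = {1, 3, 4}.
adj = {1: {2, 3, 4}, 2: {1, 4}, 3: {1, 4}, 4: {1, 2, 3}}
alpha = {
    1: {"a": 0.5, "b": 0.5},
    2: {"a": 1.0},
    3: {"b": 1.0},
    4: {"a": 0.5, "b": 0.5},
}
B = clique_network_weights(adj, alpha)
L = sum(len(nbrs) for nbrs in adj.values())           # sum_vw A_vw
L_prime = sum(sum(row.values()) for row in B.values())
print(B["a"]["b"], L, L_prime)
```

The diagonal entries B[x][x] are the self-loop weights of the clique network; they have to be kept, since the total weight is conserved only when they are included.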

Each vertex in the original network may correspond to more than one node in the maximal clique network. For example, in figure 2, the vertex 1 corresponds to two nodes a and b in the maximal clique network. Thus, a partition of the maximal clique network can be mapped to a cover of the original network, which holds the information about the overlapping community structure of the original network.
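
The mapping from a partition of the maximal clique network to a cover of the original network is a simple union of vertex sets. In the sketch below, the node-to-vertex table is the one listed in figure 2(b), while the partition itself and the function name are hypothetical and serve only to show how overlaps arise.

```python
def partition_to_cover(partition, vertices_of):
    """Map each part of a clique-network partition to a set of original vertices."""
    cover = []
    for part in partition:
        community = set()
        for node in part:
            community |= set(vertices_of[node])
        cover.append(community)
    return cover


# Nodes of the maximal clique network from figure 2(b).
vertices_of = {
    "a": {1, 2, 4}, "b": {1, 3, 4}, "c": {7, 8, 9, 10},
    "d": {5}, "e": {6}, "f": {11},
}
# Hypothetical partition of the clique network, chosen for illustration.
partition = [{"a", "d"}, {"b"}, {"c", "e", "f"}]
cover = partition_to_cover(partition, vertices_of)
print(cover, cover[0] & cover[1])
```

Because the nodes a and b fall into different parts, the vertices 1 and 4 that they share end up in two communities of the cover.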



2. Finding the overlapping community structure 

Now we investigate the overlapping community structure of the original network by partitioning its corresponding maximal clique network. To find the natural partition of a network, the optimization of modularity is a widely used technique. The partition with the maximum modularity is regarded as the optimal partition of the network. We employ the algorithm proposed in [13] to partition our maximal clique network. As an example, figure 3 shows the partition of a maximal clique network. Different parts of the partition are differentiated by shapes or colors.

As mentioned above, each partition of the maximal clique network corresponds to a cover of the original network and the cover tells us the overlapping community structure. The key question is whether the optimal partition of the maximal clique network corresponds to the optimal cover of the original network. To answer this question, we analyze the relation between the modularity of the maximal clique network and the Qc of the original network.

Let $P = \{p_1, p_2, \ldots, p_l\}$ be a partition of the maximal clique network and $C = \{c_1, c_2, \ldots, c_l\}$ be the corresponding cover of the original network. Here, l is the size of P or C, i.e., the number of communities. Using modularity, the quality of the partition P can be measured by

$$Q = \frac{1}{L'} \sum_{i} \sum_{x,y \in p_i} \left( B_{xy} - \frac{S_x S_y}{L'} \right). \tag{12}$$
FIG. 3: The maximal clique network constructed from the schematic network in figure 1. The label near each node shows its corresponding vertices in the original network. The width of a line indicates the weight of the corresponding edge. The self-loop edge of each node is omitted and its width is reflected by the volume of the associated circles, squares or triangles. In addition, the optimal partition of the maximal clique network is also depicted. The communities in this partition are differentiated by shapes. Furthermore, the circle-coded community can be partitioned into two sub-communities. The four communities are shown in different colors, which are identical to the communities depicted in figure 1. Here the parameter k is 4.



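Using equations (9) and (10), together with L' = L from equation (11) and the additivity property (5), we have

$$\begin{aligned}
Q &= \frac{1}{L} \sum_{i} \sum_{x,y \in p_i} \left( \sum_{vw} \alpha_{v m_x}\,\alpha_{w m_y} A_{vw} - \frac{\sum_{v} \alpha_{v m_x} k_v \sum_{w} \alpha_{w m_y} k_w}{L} \right) \\
  &= \frac{1}{L} \sum_{i} \sum_{vw} \left( \sum_{x \in p_i} \alpha_{v m_x} \right) \left( \sum_{y \in p_i} \alpha_{w m_y} \right) \left( A_{vw} - \frac{k_v k_w}{L} \right) \\
  &= \frac{1}{L} \sum_{i} \sum_{vw} \alpha_{v c_i}\,\alpha_{w c_i} \left( A_{vw} - \frac{k_v k_w}{L} \right) \\
  &= Q_c.
\end{aligned} \tag{13}$$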



Equation (13) tells us that the optimization of the Qc on the original network is equivalent to the optimization of the modularity on the maximal clique network. Thus we can find the optimal cover of the original network by finding the optimal partition of the corresponding maximal clique network. The optimal cover reflects the overlapping community structure of the original network.
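
The equivalence can also be checked numerically on a small case. The sketch below is an illustration under stated assumptions: the seven-vertex graph (three triangles chained through the shared vertices 3 and 5), the belonging coefficients derived from it by hand, the chosen partition, and all function names are hypothetical; L is taken as Σ_vw A_vw.

```python
def clique_network(adj, alpha):
    """B[x][y] = sum over adjacent ordered pairs (v, w) of alpha[v][x] * alpha[w][y]."""
    nodes = sorted({x for coeffs in alpha.values() for x in coeffs})
    B = {x: {y: 0.0 for y in nodes} for x in nodes}
    for v, nbrs in adj.items():
        for w in nbrs:
            for x, a_vx in alpha[v].items():
                for y, a_wy in alpha[w].items():
                    B[x][y] += a_vx * a_wy
    return B


def partition_modularity(B, parts):
    """Modularity of a partition of the weighted clique network."""
    S = {x: sum(row.values()) for x, row in B.items()}   # node strengths
    L = sum(S.values())
    return sum(B[x][y] - S[x] * S[y] / L
               for part in parts for x in part for y in part) / L


def cover_quality(adj, alpha, parts):
    """Qc of the cover induced by `parts`, computed on the original network."""
    k = {v: len(nbrs) for v, nbrs in adj.items()}
    L = sum(k.values())
    total = 0.0
    for part in parts:
        # Additivity: coefficient of v to the community merged from `part`.
        a = {v: sum(alpha[v].get(x, 0.0) for x in part) for v in adj}
        for v in adj:
            for w in adj:
                a_vw = 1.0 if w in adj[v] else 0.0
                total += a[v] * a[w] * (a_vw - k[v] * k[w] / L)
    return total / L


# Hypothetical graph: triangles a = {1,2,3}, b = {3,4,5}, c = {5,6,7}.
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (3, 5), (4, 5), (5, 6), (5, 7), (6, 7)]
adj = {v: set() for v in range(1, 8)}
for v, w in edges:
    adj[v].add(w)
    adj[w].add(v)
alpha = {1: {"a": 1.0}, 2: {"a": 1.0}, 3: {"a": 0.5, "b": 0.5}, 4: {"b": 1.0},
         5: {"b": 0.5, "c": 0.5}, 6: {"c": 1.0}, 7: {"c": 1.0}}
parts = [{"a"}, {"b", "c"}]

q_partition = partition_modularity(clique_network(adj, alpha), parts)
q_cover = cover_quality(adj, alpha, parts)
print(q_partition, q_cover)
```

The two numbers agree up to floating point rounding for any choice of `parts`, since the identity only relies on the belonging coefficients of each vertex summing to 1.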



C. Discussions

As to our method, it is important to select an appropriate parameter k. On one hand, the parameter k affects the composition of the overlapping regions between communities. According to the definition of subordinate vertices, they are excluded from the overlapping regions. Thus the larger the parameter k, the fewer the vertices which can occur in the overlapping regions. When k → ∞, the maximal clique network is identical to the original network and no overlap is identified. On the other hand, since the subordinate maximal cliques are not so highly connected, the parameter k should not be too small in practice. The choice of the parameter k depends on the specific network. In many real world networks we have observed, the typical value of k lies between 3 and 6. Additionally, for networks where larger cliques are rare, our method is close to the traditional modularity-based partition methods. In this case, few overlaps will be found.

Both the traditional modularity and the Qc are based on the significance of link density in communities compared to a null-model reference network, e.g., the configuration model network. However, unlike the traditional modularity, which requires that each node can only belong to one community, Qc requires that each maximal clique can only belong to one community. In this way, Qc takes advantage of both the local topological structure (i.e., the maximal clique) and the global statistical significance of link density.

Like the traditional modularity, however, the measure Qc also suffers from the resolution limit problem, especially when applied to large scale complex networks. Recently, some methods have been proposed to address the resolution limit problem of modularity. These methods are also applicable to the measure Qc.

Now we turn to the efficiency of our method. It is difficult to give an analytical form of the computational complexity of our method. Here we only discuss what influences its efficiency. Our method consists of three stages: finding the maximal cliques, constructing the maximal clique network, and partitioning the maximal clique network. In the first stage, we need to find all the maximal cliques in the network. This is widely believed to be a non-polynomial problem. However, for real world networks, finding all the maximal cliques is easy due to the sparseness of these networks. The computational complexity of the second stage depends on the number of edges in the original network. Finally, the cost of the partition stage depends on the number of maximal cliques and subordinate vertices. Taken together, our method is very efficient on real world networks.



In addition, as mentioned above, the overlapping community structure can be identified by the optimization of Qc. Similarly, by iteratively applying this method to each community, we can investigate the corresponding sub-communities. In this way, a rigid hierarchical relation of overlapping communities can be identified from the whole network.
FIG. 5: The network of the karate club studied by Zachary [29]. The real social fission of this network is represented by two different shapes, circle and square. The different colors show the partition obtained by our method with the parameter k = 4.



FIG. 4: Test of our method on the benchmark networks. The parameter k in the legend corresponds to the parameter k in our method. The threshold μ = 0.5 (dashed vertical line in the figure) marks the border beyond which communities are no longer defined in the strong sense, i.e., such that each node has more neighbors in its own community than in the others. Each point corresponds to an average over 100 graph realizations.



III. RESULTS 

In this section, we extensively test our method on 
the artificial networks and the real world networks with 
known community structure. Then we apply our method 
to a large real world complex network, which has been 
shown to possess overlapping community structure. 



A. Tests on artificial networks 

To test our method, we utilize the benchmark proposed in [27]. It provides benchmark networks with heterogeneous distributions of node degree and community size. In addition, it allows for overlaps between communities. This benchmark poses a much more severe test to community detection algorithms than Newman's standard benchmark. Many parameters control the networks generated by this benchmark: the number of nodes N, the average node degree ⟨k⟩, the maximum node degree maxk, the mixing ratio μ, the exponent of the power-law node degree distribution t1, the exponent of the power-law distribution of community size t2, the minimum community size minc, the maximum community size maxc, the number of overlapped nodes on, and the number of memberships of each overlapped node om. In our tests, we use the default parameter configuration where N = 1000, ⟨k⟩ = 15, maxk = 50, t1 = 2, t2 = 1, minc = 20, maxc = 50, on = 50 and om = 2. By tuning the parameter μ, we test the effectiveness of our method on networks with different fuzziness of communities. The larger the parameter μ, the fuzzier the community structure of the generated networks is.

To evaluate the effectiveness of an algorithm for the identification of overlapping community structure, a measure is needed to compare the cover found by the algorithm with the ground truth. In [23], a measure is proposed to compare two covers, which is an extension form of variation of information. The more similar two covers are, the higher the value of the measure is. In this paper, we adopt it to compare the overlapping community structure found by our method and the known overlapping community structure in the benchmark networks.

Figure 4 shows the results of our method with k = 4, 5, 6 on the benchmark networks. Our method gives rather good results when μ is smaller than 0.5. All of the values of the variation of information are above 0.8. Note that in these cases, communities are defined in the strong sense, i.e., each node has more neighbors in its own community than in the others. We also test other settings of k which are larger than 6, and find similar results.



B. Tests on real world networks 

Our first real world network for test is Zachary's karate club network [29], which is widely used as a benchmark for the methods of community identification. This network characterizes the social interactions between the individuals in a karate club at an American university. A dispute arose between the club's administrator and its principal karate teacher and as a result the club eventually split into two smaller clubs, centered around the administrator and the teacher respectively. The network and its fission are depicted in figure 5. The administrator and the teacher are represented by nodes 1 and 33 respectively.

Feeding this network into our method with the parameter k = 4, we obtain the result shown in figure 5. Similar to many existing community detection methods, our method partitions the network into four communities. This partition corresponds to a modularity of 0.417, while the real partition into two sub-networks has a modularity of 0.371. Actually, no vertex is misclassified by our method. The real split of the network can be obtained exactly by pair-wise merging of the four communities found by our method.

FIG. 6: The community structure identified by our method from the network of the bottlenose dolphins of Doubtful Sound. The primary split of the network is represented by different shapes, square and circle. The different colors show the partition obtained by our method with the parameter k = 4.

We also note that no overlaps are found when k = 4. Actually, no overlaps can be found for this network when k is no smaller than 4. Overlaps between communities emerge when the parameter k is set to 3. The value of Qc corresponding to the resulting cover is 0.385 and in total three overlapped communities are found by our method. They are {1, 5, 6, 7, 11, 17}, {1, 2, 3, 4, 8, 9, 12, 13, 14, 18, 20, 22} and {3, 9, 10, 15, 16, 19, 21, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34}. The overlapping regions consist of three vertices, namely 1, 3 and 9. Each of them is shared by two communities. Such vertices are often misclassified by traditional partition-based community detection methods. Except for the vertices occurring in the overlapping regions, the other vertices reflect the real split of the network.
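
The overlapping vertices can be read off the three reported communities directly. The following sketch only counts memberships in the cover listed above; the function name `overlapping_vertices` is an illustrative choice.

```python
from collections import Counter


def overlapping_vertices(cover):
    """Return {vertex: number of communities} for vertices in more than one community."""
    counts = Counter(v for community in cover for v in set(community))
    return {v: n for v, n in counts.items() if n > 1}


# The three communities reported in the text for the karate club network with k = 3.
karate_cover_k3 = [
    {1, 5, 6, 7, 11, 17},
    {1, 2, 3, 4, 8, 9, 12, 13, 14, 18, 20, 22},
    {3, 9, 10, 15, 16, 19, 21, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34},
]
overlaps = overlapping_vertices(karate_cover_k3)
print(sorted(overlaps.items()))
```

The count confirms that exactly the vertices 1, 3 and 9 are shared, each by two communities, and that the three communities together contain all 34 vertices.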

We also test our method on another real world network, a social network of 62 bottlenose dolphins living in Doubtful Sound, New Zealand. The network was constructed by Lusseau [13] with ties between dolphin pairs being established by observation of statistically significant frequent association. The network splits naturally into two groups, represented by the squares and circles in figure 6.

By applying our method with k = 4 to this network, four communities are obtained, denoted by different colors in figure 6. The green community is connected loosely to the other three. If the three circle-denoted communities are regarded as a single community, this community and the green community correspond to the known division observed by Lusseau [30]. Furthermore, the three circle-denoted communities also correspond to a real division among these dolphins. The further division appears to have some correlation with the gender of these animals. The blue one consists mainly of females and the other two almost entirely of males.

As with Zachary's karate club network, the overlaps between communities cannot be detected when the parameter k is not less than 4. When k = 3, overlaps between the circle-denoted communities emerge while the green community remains almost intact. The Qc of the resulting cover is 0.490. The vertices occurring in overlapping regions are Beak, Kringel, MN105, Oscar, PL, SN4, SN9 and TR99, among which the vertices Beak and Kringel are shared by all three circle-denoted communities. Again, these overlapping vertices are often misclassified by traditional partition-based methods.



C. Application to the word association network 

Now we apply our method to a large real world complex 
network, namely the word association network. 

The data set for the word association network is from the demo of the software CFinder [31]. This network consists of 7207 vertices and 31784 edges, and has been shown to possess overlapping community structure [13]. It is constructed from the South Florida Free Association norms list. Initially, the network is a directed, weighted network. The weight of a directed edge from one word to another indicates the frequency that the
TABLE I: The overlapping communities around the word play. For each community, a short description is also given. The overlapped words are emphasized in bold texts.

No. | Description | Words in each community
1 | theater | act actor actress bow character cinema curtsey dance director do drama entertain entertainment film guide involve juggler lead movie participate perform performance play portray producer production program scene screen show sing stage television theater
2 | musical instrument | alto band banjo bass beep blues brass bugle cello clarinet clef compose concert conductor country drum faddle fiddle flute guitar harp honk horn instrument jazz keyboard loud music oboe orchestra piano play rock saxophone symphony tenor treble trombone trumpet tuba tune viola violin woodwind
3 | children | adults balls children family friends fun grown-ups guardians kids love mischief nursery parents play playground play_dough prank putty toy toys tricycle
4 | sports | active arena athlete athletic baseball basketball black_and_white field football fun game illustrated inactive jock pigskin play recreation referee soccer sports stadium umpire
5 | toys | board boardwalk checkers chess fun game games monopoly nintendo play plaything strategy toy toys vcr video winning yo-yo



FIG. 7: Part of the hierarchy of communities extracted from the word association network. The dark-filled circles correspond to the five communities shown in table I.



people in the survey associated the end point of the edge 
with its start point. These directed edges were replaced 
by undirected ones with a weight equal to the sum of 
the weights of the corresponding two oppositely directed 
edges. Furthermore, the edges with the weight less than 
0.025 were deleted. In this way, an un-weighted, undi- 
rected network is obtained, and it is the network we deal 
with. 
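
The preprocessing described above can be sketched in a few lines. This is an illustration under the reading that the 0.025 threshold is applied to the summed undirected weight; the function name, the treatment of self-associations, and the example weights are hypothetical and are not taken from the data set.

```python
from collections import defaultdict


def symmetrize_and_threshold(directed_weights, threshold=0.025):
    """Build an unweighted, undirected edge set from directed association weights.

    directed_weights: dict mapping (source, target) to a weight.
    Returns a set of frozensets, one per retained undirected edge.
    """
    undirected = defaultdict(float)
    for (source, target), weight in directed_weights.items():
        if source == target:
            continue  # assumption: self-associations are ignored
        # The undirected weight is the sum of the two opposite directed weights.
        undirected[frozenset((source, target))] += weight
    # Edges with weight less than the threshold are deleted.
    return {pair for pair, weight in undirected.items() if weight >= threshold}


# Hypothetical weights, for illustration only.
example = {
    ("play", "game"): 0.02,
    ("game", "play"): 0.01,
    ("play", "toy"): 0.01,
}
edges = symmetrize_and_threshold(example)
print(edges)
```

In this example the pair (play, game) survives because its two directed weights sum to 0.03, while (play, toy) is dropped.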

Applying our method to the word association network, we obtain in total 20 communities which overlap with each other. The value of the corresponding Qc is as high as 0.503, indicating a strong overlapping community structure. These communities are so large that there is no specific semantic meaning for each community. To investigate what is correlated with the overlapping community structure, we apply our method to these communities iteratively and a hierarchy of overlapping communities is obtained. We find that the sub-communities have a certain correlation with the semantic meaning of words. As an example, table I shows the communities around the word play. The five overlapping communities represent different meanings of the word play, respectively related to theater, musical instruments, children, sports and toys. Besides the commonly shared word play, four other words are shared by some of these communities. They are fun, game, toy and toys. The overlap between these communities characterizes the direct, local relationship between them through sharing members. However, the extent of closeness between communities is sometimes reflected by the indirect, global relationship between them. One relationship of this kind is the "genealogical" relationship between communities, which can be illustrated by the hierarchy of overlapping communities. Figure 7 is an example of a hierarchy of communities. As shown in figure 7, the communities 1 and 2 are in the same branch of the hierarchy, indicating that the meanings represented by them are closer. This can be validated by examining the words contained in these two communities. Similarly, the communities 4 and 5 are also closely related. However, the distance between the communities 3 and 5 is larger although they share as many as 4 words. The overlaps between communities and the hierarchy of these communities provide us with a more complete understanding of the relationship between communities.



IV. CONCLUSIONS 

This paper focuses on the problem of quantifying and identifying the overlapping community structure of networks. There are two main contributions. Firstly, a measure Qc for the quality of a cover of a network is proposed to quantify the overlapping community structure. The effectiveness of the measure Qc is demonstrated by the experimental results that networks with significant overlapping community structure have a cover with a high Qc. Secondly, a maximal clique network is constructed from the original network, and then the overlapping community structure can be identified using any modularity optimization method on the maximal clique network.

The Qc is an extension of traditional modularity with 
the consideration that the maximal clique instead of a 
single node can only belong to one community. In this 
way, Qc takes advantage of both the local topological 
structure (i.e., the maximal clique) and the global sta- 
tistical significance of link density compared with a null- 
model reference network. In addition, Qc can be nat- 
urally used to simultaneously identify the overlapping 
and hierarchical community structure of networks. Such 
a method is helpful to more completely understand the 
functional and structural properties of networks. 

As future work, we will consider the generalization to weighted and/or directed networks. The selection of the parameter k in our method is also an interesting problem. We will further investigate how to determine an appropriate k for a given network.